Search CORE

10 research outputs found

An Italian to Catalan RBMT system reusing data from existing language pairs

Author: Ginestí-Rosell Mireia
Toral Antonio
Tyers Francis
Publication date: 01/01/2011

This paper presents an Italian-to-Catalan RBMT system built automatically by combining the linguistic data of the existing Spanish–Catalan and Spanish–Italian pairs. A lightweight manual post-processing step fixes inconsistencies in the automatically derived dictionaries and adds very frequent words that a corpus analysis showed to be missing. The system is evaluated on the KDE4 corpus and outperforms Google Translate by approximately ten absolute points in terms of both TER and GTM.

DCU Online Research Access Service
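The abstract above describes building a new pair by combining the Spanish–Catalan and Spanish–Italian data, with Spanish acting as a pivot. The following is a minimal sketch of that general idea (dictionary composition through a pivot language), not the authors' procedure. It assumes lemma-only entries and uses hypothetical toy data; real Apertium dictionaries also carry morphological tags, and the abstract does not detail how the paper handles them.

```python
from collections import defaultdict


def compose_via_pivot(src_pivot, pivot_tgt):
    """Compose two bilingual dictionaries through a shared pivot language.

    src_pivot: (source_lemma, pivot_lemma) pairs, e.g. Italian-Spanish.
    pivot_tgt: (pivot_lemma, target_lemma) pairs, e.g. Spanish-Catalan.
    Returns the sorted list of candidate (source_lemma, target_lemma) pairs.
    """
    # Index the second dictionary by its pivot-side lemma.
    by_pivot = defaultdict(set)
    for pivot, tgt in pivot_tgt:
        by_pivot[pivot].add(tgt)

    # Join on the pivot lemma: each source lemma inherits every target
    # translation of every pivot lemma it maps to.
    composed = set()
    for src, pivot in src_pivot:
        for tgt in by_pivot.get(pivot, ()):
            composed.add((src, tgt))
    return sorted(composed)


# Hypothetical toy entries (lemmas only, no morphological tags).
ita_spa = [("casa", "casa"), ("cane", "perro"), ("tempo", "tiempo")]
spa_cat = [("casa", "casa"), ("perro", "gos"), ("tiempo", "temps"), ("tiempo", "oratge")]

print(compose_via_pivot(ita_spa, spa_cat))
# [('cane', 'gos'), ('casa', 'casa'), ('tempo', 'oratge'), ('tempo', 'temps')]
```

A pivot lemma with several translations (here Spanish *tiempo*) fans out into several candidates for one source lemma. Such fan-out, together with words missing from either input dictionary, is the kind of output that a manual post-processing pass like the one in the abstract would need to review.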

La traducció automàtica en la pràctica: aplicacions, dificultats i estratègies de desenvolupament [Machine translation in practice: applications, difficulties and development strategies]

Author: Forcada Zubizarreta Mikel L.
Ginestí Rosell Mireia
Publication date: 01/01/2009

This article describes machine translation systems, their current applications and the main difficulties this language technology faces. It presents Apertium, an open-source machine translation platform on which several machine translation systems have been built for various language pairs, including pairs involving Catalan. Drawing on the authors' experience, it describes some of the tensions that arise when developing the linguistic data for a machine translation system, and the compromises needed to build useful systems.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositori d'Objectes Digitals per a l'Ensenyament la Recerca i la Cultura

Revistes Catalanes amb Accés Obert

Scipedia

Joint efforts to further develop and incorporate Apertium into the document management flow at Universitat Oberta de Catalunya

Author: Ginestí Rosell Mireia
Ortiz Rojas Sergio
Villarejo Muñoz Luis
Publication venue: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Publication date: 01/01/2009

This article describes the translation needs of the Universitat Oberta de Catalunya (UOC) and how Prompsit met them by further developing Apertium, a free rule-based machine translation system. We first describe the general framework of linguistic needs within UOC. Section 2 then introduces Apertium and outlines the development scenario that Prompsit carried out. Section 3 outlines UOC's specific needs and explains why Apertium was chosen as the machine translation engine. Section 4 describes some of the features developed specifically for this project, and Section 5 explains how the linguistic data were improved to raise the quality of the Catalan and Spanish output. Finally, we draw conclusions and outline further work arising from the project.

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

An Italian to Catalan RBMT system reusing data from existing language pairs

Author: Ginestí-Rosell Mireia
Toral Antonio
Tyers Francis
Publication venue: 'Fundacio per la Universitat Oberta de Catalunya'
Publication date: 01/01/2011

This paper presents an Italian-to-Catalan rule-based machine translation (RBMT) system built automatically by combining the linguistic data of the existing Spanish–Catalan and Spanish–Italian pairs. A lightweight manual post-processing step fixes inconsistencies in the automatically derived dictionaries and adds very frequent words that a corpus analysis showed to be missing. The system is evaluated on the KDE4 corpus and outperforms Google Translate by approximately ten absolute points in terms of both TER (translation edit rate) and GTM (General Text Matcher).

CiteSeerX

The Oberta in open access

DCU Online Research Access Service
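The evaluation above reports TER and GTM. TER counts the word-level edits needed to turn a system output into a reference, normalised by reference length (lower is better), whereas GTM is a matching-based similarity score (higher is better). The following sketch computes a deliberately simplified TER. It omits TER's block-shift operation, so it is really a word error rate. TER only applies shifts that lower the edit count, so this figure is an upper bound on TER for the same sentence pair. The sentences are hypothetical, and this is not the scorer used in the paper.

```python
def word_edit_rate(hypothesis: str, reference: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Simplified stand-in for TER: counts insertions, deletions and
    substitutions only, without TER's block shifts (so this is word error rate).
    """
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = minimum edits turning the hyp words seen so far into ref[:j].
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete hypothesis word h
                curr[j - 1] + 1,         # insert reference word r
                prev[j - 1] + (h != r),  # substitute, free when words match
            )
        prev = curr
    return prev[-1] / len(ref)


hyp = "the cat sat on the mat"
ref = "the cat is sitting on the mat"
print(round(word_edit_rate(hyp, ref), 4))  # 0.2857 (2 edits / 7 reference words)
```

When scores are expressed as percentages, a gap of "ten absolute points" corresponds to a difference of 0.10 in this ratio.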

An Italian to Catalan RBMT system reusing data from existing language pairs

Author: Ginestí-Rosell Mireia
Toral Antonio
Tyers Francis
Publication venue: 'Fundacio per la Universitat Oberta de Catalunya'

RECERCAT

Desarrollo de un sistema libre de traducción automática del euskera al castellano [Development of a free Basque-to-Spanish machine translation system]

Author: Forcada Mikel L.
Ginestí Rosell Mireia
Ortiz Rojas Sergio
Ramírez Sánchez Gema
Tyers Francis M.
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2009

This paper presents a free (open-source) rule-based machine translation system between Basque and Spanish. It is built on the Apertium machine translation platform and is intended for assimilation, that is, as an aid to understanding texts written in Basque. The paper describes the development process and current status of the system and presents an evaluation of the quality of its translations. Development was supported and funded by Prompsit Language Engineering S.L. and the Universitat d’Alacant.

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Italian-Catalan LMF Apertium Bilingual dictionary

Author: Ginestí Rosell Mireia
Toral Antonio
Tyers Francis M.
Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)
Publication venue: Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)
Publication date: 10/10/2011

This is the LMF version of the Apertium bilingual dictionary for Italian and Catalan. The bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated, together with the bilingual correspondence itself (a SenseAxis element). Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently extended to more divergent pairs (such as English–Catalan). The platform provides a language-independent machine translation engine; tools to manage the linguistic data needed to build a machine translation system for a given language pair; and linguistic data for a growing number of language pairs.

UPF Digital Repository
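The LMF record above describes one conversion rule: each bilingual correspondence yields a source LexicalEntry, a target LexicalEntry and a SenseAxis linking them. The following sketch applies that rule with Python's standard `xml.etree.ElementTree`. It assumes input already extracted as hypothetical (source lemma, target lemma, part of speech) triples. The layout is deliberately simplified, including the `feat`/`att`/`val` attributes, the `Sense` ids and the `senses` attribute on SenseAxis, and it does not reproduce the schema of the released resource.

```python
import xml.etree.ElementTree as ET


def bilingual_pairs_to_lmf(pairs, src_lang, tgt_lang):
    """Build a simplified LMF-like tree from (src_lemma, tgt_lemma, pos) triples.

    Element/attribute layout is a hypothetical simplification for illustration.
    """
    root = ET.Element("LexicalResource")
    lexicons = {}
    for lang in (src_lang, tgt_lang):
        # One monolingual Lexicon per language.
        lexicon = ET.SubElement(root, "Lexicon")
        ET.SubElement(lexicon, "feat", att="language", val=lang)
        lexicons[lang] = lexicon

    for n, (src, tgt, pos) in enumerate(pairs, start=1):
        sense_ids = []
        for lang, lemma in ((src_lang, src), (tgt_lang, tgt)):
            # A monolingual LexicalEntry for each side of the correspondence.
            entry = ET.SubElement(lexicons[lang], "LexicalEntry", id=f"{lang}_{n}")
            ET.SubElement(entry, "feat", att="partOfSpeech", val=pos)
            lemma_el = ET.SubElement(entry, "Lemma")
            ET.SubElement(lemma_el, "feat", att="writtenForm", val=lemma)
            sense_id = f"{lang}_{n}_s"
            ET.SubElement(entry, "Sense", id=sense_id)
            sense_ids.append(sense_id)
        # The SenseAxis records the bilingual link between the two senses.
        ET.SubElement(root, "SenseAxis", id=f"axis_{n}", senses=" ".join(sense_ids))
    return root


root = bilingual_pairs_to_lmf([("cane", "gos", "noun"), ("tempo", "temps", "noun")], "ita", "cat")
print(ET.tostring(root, encoding="unicode"))
```

The design mirrors the description in the record. The two monolingual sides remain independent lexicons, and only the SenseAxis elements carry the bilingual information.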

An open-source shallow-transfer machine translation toolbox: consequences of its release and availability

Author: Armentano Oller Carme
Bonev Boyan
Corbí Bellot Antonio Miguel
Forcada Mikel L.
Ginestí Rosell Mireia
Ortiz Rojas Sergio
Pérez-Ortiz Juan Antonio
Ramírez Sánchez Gema
Sánchez-Martínez Felipe
Publication venue: OSMaTran
Publication date: 01/09/2005

By the time Machine Translation Summit X is held in September 2005, our group will have released an open-source machine translation toolbox as part of a large government-funded project involving four universities and three language technology companies from Spain. The toolbox, which will most likely be released under a GPL-like license, includes (a) the open-source engine itself, a modular shallow-transfer machine translation engine suitable for related languages and largely based on the engines of systems we have already developed, such as interNOSTRUM for Spanish–Catalan and Traductor Universia for Spanish–Portuguese; (b) extensive documentation (including document type declarations) specifying the XML format of all linguistic data files (dictionaries, rules) and document format management files; (c) compilers that convert these data into the high-speed format (tens of thousands of words per second) used by the engine; and (d) pilot linguistic data for Spanish–Catalan and Spanish–Galician, plus format management specifications for HTML, RTF and plain text. After briefly describing the toolbox, the paper explores possible consequences of the availability of this architecture, including community-driven development of machine translation systems for languages that lack this kind of language technology. The development of the toolbox is funded by project FIT-340101-2004-3 (Spanish Ministry of Industry, Commerce and Tourism).

Repositorio Institucional de la Universidad de Alicante

Apertium, una plataforma de código abierto para el desarrollo de sistemas de traducción automática [Apertium, an open-source platform for developing machine translation systems]

Author: Armentano Oller Carme
Corbí Bellot Antonio Miguel
Forcada Mikel L.
Ginestí Rosell Mireia
Montava Belda Marco A.
Ortiz Rojas Sergio
Pérez-Ortiz Juan Antonio
Ramírez Sánchez Gema
Sánchez-Martínez Felipe
Publication venue: Universidad de Cádiz. Servicio de Publicaciones
Publication date: 01/01/2007

One of the main challenges for computer science in the coming decades is the development of systems that can process natural (human) language effectively. Within this field, machine translation systems, which translate a text written in one language into an equivalent version in another, receive particular attention given, for example, the multilingual nature of societies such as Europe's. Automating this process is particularly complex because programs must deal with features of natural language, such as ambiguity, whose algorithmic treatment is not feasible. As a result, even an approximation or partial automation of the process is considered a success. Machine translation programs have traditionally been closed systems, but the trend set by free software has recently reached this field as well. In this article we describe Apertium (apertium.org), an advanced open-source platform licensed under the GNU GPL. Because it decouples data from programs, Apertium makes it easy to develop new machine translation systems. The Apertium platform was developed by the Transducens research group at the Universitat d’Alacant within several collaborative projects with universities and companies in Spain. Besides the programs that make up the translation engine, these projects produced open linguistic data for Catalan–Spanish, Galician–Spanish, Portuguese–Spanish, French–Catalan, English–Catalan and Occitan–Catalan machine translation.
Both the platform that hosts the translation engine and the data for these language pairs are available for download at sf.net/projects/apertium/ and for online evaluation at xixona.dlsi.ua.es/prototype/. This work was partially funded by the Spanish Ministry of Industry, Commerce and Tourism through projects FIT-340101-2004-3, FIT-340001-2005-2 and FIT-350401-2006-5, by the Spanish Ministry of Education and Science through projects TIC2003-08681-C02-01 and TIN2006-15071-C03-01, and by the Generalitat de Catalunya through project DURSI1-05I. Felipe Sánchez-Martínez holds research-training grant BES-2004-4711, funded by the European Social Fund and the Spanish Ministry of Education and Science.

Repositorio Institucional de la Universidad de Alicante

Open-source Portuguese-Spanish machine translation

Author: Antonio M. Corbí-Bellot
Carme Armentano-Oller
Felipe Sánchez-Martínez
Gema Ramírez-Sánchez
Juan Antonio Pérez-Ortiz
Mikel L. Forcada
Mireia Ginestí-Rosell
Miriam A. Scalco
Rafael C. Carrasco
Sergio Ortiz-Rojas
Publication venue: Springer-Verlag
Publication date: 01/01/2006

This paper describes the current state of development of an open-source shallow-transfer machine translation (MT) system for the [European] Portuguese ↔ Spanish language pair, developed with the OpenTrad Apertium MT toolbox (www.apertium.org). Apertium uses finite-state transducers for lexical processing, hidden Markov models for part-of-speech tagging, and finite-state-based chunking for structural transfer. It rests on a simple rationale: to produce fast, reasonably intelligible and easily correctable translations between related languages, it is enough to use an MT strategy that refines word-for-word MT with shallow parsing techniques. The paper briefly describes the MT engine, the formats it uses for linguistic data, and the compilers that convert these data into the efficient format used by the engine, and then describes the pilot Portuguese↔Spanish linguistic data in more detail.

CiteSeerX

Crossref
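The Portuguese↔Spanish abstract notes that Apertium uses hidden Markov models for part-of-speech tagging. The following sketch illustrates only that general technique: textbook first-order HMM Viterbi decoding. It is not Apertium's tagger, which differs in its details (for example, it works on the ambiguity classes produced by morphological analysis and is trained rather than hand-set). The tag set, probabilities and example words are hypothetical.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p):
    """Most probable tag sequence for `words` under a first-order HMM.

    Scores are summed in log space to avoid underflow; any probability
    missing from the tables is treated as zero.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t] = (log-probability of the best path ending in tag t, that path)
    best = {
        t: (log(start_p.get(t, 0.0)) + log(emit_p[t].get(words[0], 0.0)), [t])
        for t in tags
    }
    for word in words[1:]:
        new_best = {}
        for t in tags:
            # Choose the predecessor tag that maximises the score of entering t.
            score, path = max(
                (best[prev][0] + log(trans_p[prev].get(t, 0.0)), best[prev][1])
                for prev in tags
            )
            new_best[t] = (score + log(emit_p[t].get(word, 0.0)), path + [t])
        best = new_best
    return max(best.values())[1]


# Hypothetical toy model: "la" is DET or PRON, "casa" is NOUN or VERB.
tags = ["DET", "PRON", "NOUN", "VERB"]
start_p = {"DET": 0.5, "PRON": 0.3, "NOUN": 0.1, "VERB": 0.1}
trans_p = {
    "DET": {"NOUN": 0.9, "VERB": 0.05, "PRON": 0.05},
    "PRON": {"VERB": 0.8, "NOUN": 0.1, "DET": 0.05, "PRON": 0.05},
    "NOUN": {"VERB": 0.5, "DET": 0.3, "NOUN": 0.1, "PRON": 0.1},
    "VERB": {"DET": 0.5, "NOUN": 0.2, "PRON": 0.2, "VERB": 0.1},
}
emit_p = {
    "DET": {"la": 0.6},
    "PRON": {"la": 0.4},
    "NOUN": {"casa": 0.5},
    "VERB": {"casa": 0.3},
}
print(viterbi(["la", "casa"], tags, start_p, trans_p, emit_p))  # ['DET', 'NOUN']
```

Both words are ambiguous on their own. The transition probabilities resolve the ambiguity: DET followed by NOUN (0.5 × 0.6 × 0.9 × 0.5 = 0.135) beats PRON followed by VERB (0.3 × 0.4 × 0.8 × 0.3 = 0.0288).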